Member rate £492.50
Non-Member rate £985.00
Save £45 Loyalty discount applied automatically*
Save 5% on each additional course booked
*If you attended our Methods School in the last calendar year, you qualify for £45 off your course fee.
Monday 8 to Friday 12 August 2016
Generally classes are either 09:00-12:30 or 14:00-17:30
15 hours over 5 days
The aim of this course is to offer a detailed but accessible introduction to generalized linear modeling (GLM). Political scientists are often confronted with outcome variables that are not continuous, such as survey respondents' choices among two or more options, ordinal survey items, or event counts. GLM is a common technique for performing regression in these cases. The course aims to make students comfortable with applying GLM techniques to a variety of outcome variables. It discusses the logic of GLM and maximum likelihood estimation, with applications to binary, ordinal, categorical, and count data. Particular emphasis will be put on interpreting estimates, obtaining quantities of interest, and visualizing the results in a compelling way.
Federico Vegetti is a postdoctoral research fellow in Political Science at CEU. He gained his PhD in Political Science from the University of Mannheim in 2013.
His research interests include political psychology and behaviour, comparative politics, political economy, and quantitative research methods.
Political scientists are often interested in studying outcome variables that are not continuous. For instance, scholars may be interested in studying discrete choices among two or more options (e.g. voting or abstaining, choosing party X instead of party Y or Z, etc.) or the number of times a particular event is repeated (e.g. how many wars a country was involved in over a given period of time). In these cases, using OLS regression may produce biased or even meaningless results.
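To make the problem concrete, here is a minimal sketch of what can go wrong when a straight line is fitted to a binary outcome by OLS. It is written in Python using only the standard library, although the course labs use R. The toy data and the helper name `ols_fit` are assumptions made for illustration and are not taken from the course material.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data: the binary outcome is 1 only for positive x.
xs = list(range(-5, 6))
ys = [1 if x > 0 else 0 for x in xs]

intercept, slope = ols_fit(xs, ys)

# Fitted values of a linear probability model at the extremes of x.
fitted_low = intercept + slope * min(xs)
fitted_high = intercept + slope * max(xs)

print("fitted value at smallest x:", round(fitted_low, 3))
print("fitted value at largest x:", round(fitted_high, 3))
```

In this toy example the fitted value at the smallest x is negative and the fitted value at the largest x exceeds one, so neither can be read as a probability.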
This course provides an introduction to generalized linear modeling (GLM), a standard technique for handling some common types of non-continuous dependent variables. By reflecting on the type of data generating process behind the observed outcomes, the course aims to make students comfortable with concepts such as the linear predictor, the link function, and maximum likelihood. Moreover, by relying on statistical simulation as well as real-world data, the course will provide students with tools to generate and report quantities of interest obtained via GLM regressions.
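The concepts named above can be sketched in a few lines: a linear predictor, an inverse logit link that maps it to a probability, a simulated data generating process, and a maximum likelihood fit. The example below is a rough illustration in standard-library Python (the course labs use R). The function names, the "true" coefficients, the sample size, and the choice of Newton-Raphson as the optimizer are all assumptions made for this sketch.

```python
import math
import random


def inv_logit(eta):
    """Inverse logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate(n, b0, b1, seed=1):
    """Simulate a binary outcome from a logit data generating process."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < inv_logit(b0 + b1 * x) else 0 for x in xs]
    return xs, ys


def score(b0, b1, xs, ys):
    """Gradient of the Bernoulli log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        resid = y - inv_logit(b0 + b1 * x)
        g0 += resid
        g1 += resid * x
    return g0, g1


def fit_logit(xs, ys, iters=25):
    """Maximize the log-likelihood with Newton-Raphson steps."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = score(b0, b1, xs, ys)
        # Entries of the 2x2 information matrix X'WX, with W = p(1 - p).
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = inv_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + (X'WX)^{-1} * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical "true" parameters of the data generating process.
xs, ys = simulate(5000, b0=-0.5, b1=1.0)
est_b0, est_b1 = fit_logit(xs, ys)
print("estimated intercept:", round(est_b0, 3))
print("estimated slope:", round(est_b1, 3))
```

Because the data are simulated, the estimates can be compared with the known parameters of the data generating process, which is the main appeal of simulation as a teaching device.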
The course starts in medias res, discussing the simplest and most common case where GLM is needed, namely binary response variables. Students will be encouraged to discuss what problems may arise when applying linear regression to dichotomous variables, which assumptions are violated, and why it matters. With this example in mind, the course proceeds with a general introduction to the logic of GLM and to the maximum likelihood method for estimating models in the GLM framework. Then, the course focuses on issues arising from the interpretation of the coefficients. In this part, we will discuss some strategies to obtain quantities of interest (e.g. predicted probabilities) and visualize them in a compelling way. This will include the interpretation of interaction terms and the graphic presentation of interaction effects. In the last two days, the course covers three less common but nevertheless important types of outcome variables: ordinal, categorical, and counts. On the fourth day, ordinal and multinomial logit models are discussed as a generalization of the framework introduced in the study of binary outcomes. On the fifth day, the focus moves to Poisson and negative binomial regression models.
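As a small illustration of why logit coefficients need to be converted into quantities of interest, the sketch below turns a pair of coefficients into predicted probabilities and first differences. It is standard-library Python rather than the R used in the labs, and the coefficient values and function names are hypothetical, chosen only for this example.

```python
import math


def inv_logit(eta):
    """Inverse logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical logit coefficients (not estimates from any real data).
B0, B1 = -1.0, 0.8


def predicted_prob(x):
    """Predicted probability of the outcome at a given value of x."""
    return inv_logit(B0 + B1 * x)


def first_difference(x_from, x_to):
    """Change in predicted probability when x moves from x_from to x_to."""
    return predicted_prob(x_to) - predicted_prob(x_from)


# The same one-unit change in x shifts the probability by different
# amounts depending on where on the curve it starts.
diff_low = first_difference(0.0, 1.0)
diff_high = first_difference(3.0, 4.0)
print("first difference, x from 0 to 1:", round(diff_low, 3))
print("first difference, x from 3 to 4:", round(diff_high, 3))
```

The coefficient on x is constant on the log-odds scale, but the implied change in probability is not, which is why predicted probabilities and first differences are usually reported and plotted instead of raw coefficients.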
The lab sessions will be based on the open-source statistical software R (www.r-project.org). Because several functions will require the use of additional packages, it is recommended that the students bring their own laptops, so that the download and installation of additional components will proceed smoothly. The lecturer will provide all the datasets necessary for the lab exercises.
The course assumes a basic understanding of descriptive statistics and probability theory (e.g. types of variables, basic statistics, common distributions) and some proficiency with linear regression analysis (e.g. how to interpret OLS output). Moreover, the course will assume some familiarity with R. Note that this is not an introductory course on R. Although the lecturer will be happy to explain and discuss the code used for the exercises, it is assumed that students can move within the R environment with a certain degree of confidence.
a) Participants should have taken the course on “Multiple Regression Analysis: Estimation and Diagnostics” in the first week of the summer school or have obtained equivalent prior knowledge through other means. (The course might have a different title this year; please check.)
b) The course relies heavily on the software R: students will be given examples of some rather abstract concepts like “maximum likelihood” or “data generating process” by means of statistical simulation. While this can be a very helpful tool, it requires that students have a basic understanding of the R language. For students unfamiliar with R, a preparatory course will be offered prior to the first week. Otherwise, online resources are plentiful. I recommend the tutorial “Try R”, available online for free (http://tryr.codeschool.com/) and/or the “Foundations” section of the online tutorial by Hadley Wickham (http://adv-r.had.co.nz/). A good introductory book is “R in a Nutshell – A Desktop Quick Reference” by Joseph Adler (O’Reilly, 2010).
c) Students are expected to understand the logic of inferential statistics. Students familiar with R but in need of a refresher in basic statistics are encouraged to take part in the preparatory course on statistics.
d) The course will use some matrix algebra notation, so students should have some familiarity with the logic of matrix algebra.
Day | Topic | Details |
---|---|---|
Monday 1 | Modeling Binary Response Variables: what to do? | 90’ Lecture: Linear probability models, problems and alternatives. 90’ Lab: Simulating a data generating process. |
Tuesday 2 | The general logic of GLM and Maximum likelihood | 90’ Lecture: Distributions, link functions and maximum likelihood estimation. 90’ Lab: More on logit models, inside ML estimation. |
Wednesday 3 | Interpreting coefficients of logit models, quantities of interest, and interactions | 90’ Lecture: Interpreting logit models and interactions. 90’ Lab: Simulation-based approaches for quantities of interest. |
Thursday 4 | Modeling categorical and ordinal variables | 90’ Lecture: Multinomial and ordered logistic regression. 90’ Lab: Specifying and testing multinomial and ordered logit models. |
Friday 5 | Modeling counts | 90’ Lecture: Poisson and negative binomial regression. 90’ Lab: Specifying and testing Poisson and negative binomial models. |
Day | Readings |
---|---|
Monday 1 | Fox (2008). Ch. 14, Logit and Probit Models for Categorical Response Variables; Fox (2008). Ch. 15, Generalized Linear Models, pp. 379-387 (http://www.sagepub.com/upm-data/21121_Chapter_15.pdf) |
Tuesday 2 | Enders (2010). Ch. 3, An Introduction to Maximum Likelihood Estimation; Eliason (1993). Maximum Likelihood Estimation. Logic and Practice, pp. 1-28; King (1998). Ch. 1, 2 |
Wednesday 3 | Brambor et al. (2006). Understanding Interaction Models: Improving Empirical Analyses; Braumoeller (2004). Hypothesis Testing and Multiplicative Interaction Terms; Berry et al. (2010). Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential?; Berry et al. (2012). Improving Tests of Theories Positing Interaction |
Thursday 4 | Long (1997). Ch. 5, Ordinal Outcomes; Long (1997). Ch. 6, Nominal Outcomes |
Friday 5 | Fox (2008). Ch. 15, Generalized Linear Models, pp. 387-394 (http://www.sagepub.com/upm-data/21121_Chapter_15.pdf); Long (1997). Ch. 8, Count Outcomes; Benoit (1996). Democracies Really Are More Pacific (in general) |
R, RStudio (highly recommended)
https://www.rstudio.com/products/rstudio/
Participants need to bring their own laptop with software installed.
Books:
Fox, J., 2008. Applied Regression Analysis and Generalized Linear Models, Sage. (Ch. 14, 15)
Eliason, S.R., 1993. Maximum Likelihood Estimation. Logic and Practice. Sage. (Ch. 1, 2)
Enders, C.K., 2010. Applied Missing Data Analysis. Guilford Press. (Ch. 3 – “An Introduction to Maximum Likelihood Estimation” offers a clear and intuitive discussion of ML)
King, G. 1998. Unifying Political Methodology. University of Michigan Press. (Ch. 1, 2 for a conceptual discussion of the inferential logic and the likelihood. Ch. 3, 4, 5 are optional, but recommended)
Long, J. Scott, 1997. Regression Models for Categorical and Limited Dependent Variables. Sage. (Ch. 5, 6, 8)
Books about R:
Adler, J., 2010. R in a Nutshell – A Desktop Quick Reference, O'Reilly. (A general introduction to R, we will take some examples from the book in the lab sessions)
Articles:
Benoit, K., 1996. Democracies Really Are More Pacific (in general). Journal of Conflict Resolution.
Berry, W.D., DeMeritt, J.H.R., and Esarey, J., 2010. Testing for Interaction in Binary Logit and Probit Models: Is a Product Term Essential? American Journal of Political Science.
Berry, W.D., Golder, M., and Milton, D., 2012. Improving Tests of Theories Positing Interaction. Journal of Politics.
Brambor, T., Clark, W.R., and Golder, M., 2006. Understanding Interaction Models: Improving Empirical Analyses. Political Analysis.
Braumoeller, B.F., 2004. Hypothesis Testing and Multiplicative Interaction Terms. International Organization.
Further readings:
Fitzmaurice, G.M., Laird, N.M., and Ware, J.H., 2004. Applied Longitudinal Analysis. Wiley. (Ch. 10 – “Review of Generalized Linear Models” is another reading explaining the logic of GLMs, like the Fox chapter. You don't have to read it all, and you can definitely skip the SAS part, but it may be useful to see the same concepts explained in a different context.)
Introduction to R
Introduction to Multivariate Linear Regression